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ABSTRACT 

Power spectrum estimation and evaluation of associated errors in the presence 
of incomplete sky coverage; non-homogeneous, correlated instrumental noise; and 
foreground emission is a problem of central importance for the extraction of cos- 
mological information from the cosmic microwave background. We develop a 
Monte Carlo approach for the maximum likelihood estimatation of the power 
spectrum. The method is based on an identity for the Bayesian posterior as a 
marginalization over unknowns, and maximization of the posterior involves the 
computation of expectation values as a sample average from maps of the cosmic 
microwave background and foregrounds given some current estimate of the power 
spectrum or cosmological model, and some assumed statistical characterization 
of the foregrounds. Maps of the CMB and foregrounds are sampled by a lin- 
ear transform of a Gaussian white noise process, implemented numerically with 
conjugate gradient descent. For time series data with N t samples, and N pixels 
on the sphere, the method has a computational expense KO[N 2 + Nt log Nt], 
where K is a pref actor determined by the convergence rate of conjugate gradient 
descent. Preconditioners for conjugate gradient descent are given for scans close 
to great circle paths, and the method allows partial sky coverage for these cases 
by numerically marginalizing over the unobserved, or removed, region. 

Subject headings: cosmic microwave background - methods: statistical 



1. Introduction 

Power spectrum estimation and evaluation of the associated errors in the presence of 
incomplete sky coverage; non-homogeneous, correlated instrumental noise; and foreground 
emission is a problem of central importance for the extraction of cosmological information 
from the cosmic microwave background. From a Bayesian point of view, power spectrum 
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estimation involves the maximization of the posterior probability density, with error bars 
given by the set of cosmological parameters or power spectrum whose integrated posterior 
density achieves some specified level of confidence. A Bayesian approach to CMB analysis 
for large data sets involving a direct evaluation of the likelihood is intractable due to the 
0(N 3 ) expense associated with computing the inverse of non-sparse matrices, or their deter- 
minants ( (Borrill 1999), (Bond et al. 1999) ). The goal of this paper is the development of 
alternative numerical methods, specifically Monte Carlo techniques, for the Bayesian anal- 
ysis of the CMB, including the complications of incomplete sky coverage, correlated noise 
and foregrounds. 

Previous work has demonstrated that for a certain class of scanning strategies, the sig- 
nal and inverse noise matrices are block diagonal. The block diagonal properties of these 
matrices give an exact 0(N 2 ) Bayesian method, and therefore tractable for data sets as 
large as will be returned from Planck. The complications of this method are that it can- 
not easily accommodate partial sky converge or precessing scan strategies. The method 
of (Oh et al. 1999 ) computes the maximum of the likelihood through a Newton Raphson 
method. The numerical innovations of this method involve Monte Carlo simulations and 
the use of conjugate gradient descent, giving an overall expense 0(N 2 ). The method was 
proposed and numerically demonstrated in the context of uncorrelated noise, and a region of 
sky coverage of azimuthal symmetry, where a good preconditioner can be constructed. How- 
ever, the algorithm is in fact more general, provided there is sufficient memory for storage 
of the needed matrices, and that conjugate gradient descent converges quickly enough (i.e. 
there is a good preconditioner). 

As suggested in (Wandelt and Hansen 2001 ), we can use the ring set approach to supply 
preconditioners. An outstanding problem to be solved is a way of retaining the mathematical 
advantages of a ring set scan (block diagonal inverse noise and signal matrices) while accom- 
modating partial sky coverage. The approach formulated in this paper handles the problem 
of partial sky coverage by embedding the data in an azimuthally symmetric region of sky, 
and using a Monte Carlo Markov Chain to numerically marginalize over the unobserved part. 
For scans close to ring sets, we therefore inherit good preconditioners, allowing an extension 
of both the ring set and conjugate gradient methods to scan strategies as planned for Planck. 

For observations y — s + f + rj, where (s, f, r/) are the CMB signal, foregrounds, and 
noise respectively, our approach to power spectrum estimation is motivated by the identity 
(derived in Appendix A.l ) 



where T is any parameterization of conclusions (such as the power spectrum or cosmological 
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parameters), and r is any fixed guess. The Bayesian posterior ratio on the left is given as 
an integral over the unknown quantities which are assumed to generate the observed data. 
Maximization of the posterior involves computing the gradient of equation 1 which will be 
shown to depend on the expectation value of the power spectrum with respect to the random 



We maximize the posterior ratio in equation 1 by the expectation maximization algorithm 
( (Dempster et al. 1977) ) which proceeds by iteratively setting C;(T„ + i) = E[Ci(s)\T n ,y]. 
The algorithm converges to the posterior maximum for a uniform prior, and gives an un- 
biased, consistent estimator (see Appendix B ). In this paper, we focus on computation of 
the expectation value of the power spectrum E[Ci(s)\T n , y] given the data and some guess 
T n under the assumption of perfect foreground separation (although we comment on how 
the approach can be generalized to include foregrounds later in the paper, and leave its 
numerical demonstration for future work). 

We compute the expectation value of the power spectrum E[Ci(s)\T n , y] numerically 
with a Monte Carlo approach, where we sample maps of the CMB from the probability 
density p(s\f, y, To). Conditioning on some estimate of the foregrounds, the method exploits 
the fact that p(s\f,T ,y) is a Gaussian random field, and therefore completely characterized 
by the mean field map and covariance matrix of fluctuations about that map. Maps are 
sampled from p(s\f,T ,y) by first computing the mean field map with conjugate gradient 
descent, and then sampling fluctuations about the mean field map from a zero mean Gaussian 
field with covariance matrix (N^ 1 + C~ 1 )~ 1 (where N~ l is the inverse noise matrix, and 
is the inverse covariance matrix for the CMB). These fluctuation maps are sampled by a 
linear transformation, numerically computed with conjugate gradient descent, of a spatial 
white noise Gaussian process thereby generating maps with all the same statistical properties 
as samples from p(s\f, T , y). 

Each step of conjugate gradient descent involves a multiplication by the matrix / + 
CiV -1 , which can be done very quickly by multiplication by iV _1 in the basis in which it is 
diagonal, followed by a transform to the spherical harmonic basis where C is diagonal. For 
spatially uncorrelated noise and circularly symmetric beams, we only need to transform from 
the pixel to the spherical harmonic domain, with an expense 0(N 3 ^ 2 ) (Oh et al. 1999 ). In 
order to accommodate spatially correlated noise, we transform to the time domain, followed 
by a transform to the spherical harmonics, giving an expense K[0(N 3 ^ 2 )+N t log N t ] where N t 
is the number of time samples, and K depends on the convergence rate of conjugate gradient 
descent. Including the full complications of asymmetric beams, we would need to compute a 
convolution on the sphere. Using the convolution method of ( (Wandelt and Gorski 2000 )), 



field p(s,f\r ,y), 
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the expense of our method is K[0(N 2 ) + N t \ogN t ]. 

The computational feasibility of this method is limited by finding a numerical imple- 
mentation of conjugate gradient descent which converges quickly so that the prefactor K 
above is small. The strategy here is to embed the data in a region covered by an exact 
ring set scan, following the intuition that good preconditioners can be constructed for scan 
strategies close to ring sets (Wandelt and Hansen 2001 ). Embedding the data in a region 
on the sky with no observations (or where they have been removed) is accommodated by 
numerically marginalizing over the missing observations. Moreover, the same techniques 
can be used to marginalize over the foregrounds, and provide Monte Carlo estimates of the 
confidence intervals for cosmological parameters. 

The paper is organized as follows. We first review complications with a direct computa- 
tion of the likelihood, and provide an overview of our approach. We then discuss a technique 
we call transformed white noise sampling, which allows us to sample maps representing fluc- 
tuations about the mean field map for some guess of the power spectrum. We demonstrate 
the method with a flat sky 512 x 512 test case, including incomplete sky coverage, with 
uncorrelated, non-homogeneous noise. We close with a discussion of further complications 
encountered in real CMB experiments, and how they can be accommodated in the framework 
presented here. 



We begin with a brief review of the likelihood and complications with its computational 
evaluation. The data returned from an experiment is a vector in the time domain d(t), which 
is related to the CMB signal on the sky s(n) through some linear mapping and additive 
Gaussian noise, 



where B(n',n") is the beam of the instrument, and r)(t) is Gaussian noise, assumed to be 
stationary with a noise correlation matrix N(t, t') = N(t — t'). Denoting the linear mapping 
from time domain to the sky as 



2. Power Spectrum Estimation 



2.1. 



Likelihood 




(3) 




(4) 
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we will simply write d — Rs + rj. The likelihood for the power spectrum C\ given the data is 

p(d\d) OC J ds e -{d-Rs)N-\d-Rs) e -sC^s (5) 

Any linear transformation of the data vector will generate a Gaussian form for the likeli- 
hood (as reviewed in Appendix A. 2 ). For example, we can transform the data to an esti- 
mate of the CMB map s = (R T N^R^R? N^d (as discussed in detail in (Tegmark 1997), 
(Stompor et al. 2002) ). The covariance matrix of this map is 

E[s®s] = C +{R T N~ 1 R)- 1 (6) 

which shows that the map s can be thought of as signal C with additive Gaussian noise with 
covariance matrix (R T Nf 1 R)~ 1 . The likelihood can can equivalently be written in terms of 
the map s(d) according to 

\ogp[s(d)\Q} = -si^^N^R)- 1 + C]- 1 s(d)-\ogdet[{R T N- 1 R)- 1 + C] (7) 

We can also write the likelihood in the original time domain, 

\ogp(d\Q) = -d(N + RCR T )- 1 d - logdet[iV + RCR T ] (8) 

Directly evaluating the likelihood in either the spatial or time domain results in an 0(N 3 ) 
computation, as it involves inversion, or computation of the determinant, of [(-R T A^ _1 i?) _1 + 
C] or [N + RCR T ] respectively. The computational expense is due to the fact that we do 
not know the eigenbasis for either of these matrices, and computing this basis is generally 
an 0(N 3 ) problem. 

The method of (Oh et al. 1999 ) solves the likelihood in the spatial domain by evaluating 
the determinant with a Monte Carlo algorithm. The method involves conjugate gradient 
descent to solve a linear problem, and as such involves matrix multiplication, which carries 
an 0(N 2 ) expense. The method was proposed and numerically demonstrated in the context 
of uncorrelated noise, and a region of sky coverage of azimuthal symmetry. However, the 
algorithm is in fact more general, provided there is sufficient memory for storage of the 
needed matrices, and that conjugate gradient descent converges quickly enough (i.e. there 
is a good preconditioner). 

For a certain class of observing strategies, we can exactly compute the likelihood, and 
for perturbations about these cases, we can use the approximate case as a preconditioner 
(as suggested in (Wandelt and Hansen 2001 ) ). The approach formulated in this paper 
provides a consistent way to do this, and involves a Monte Carlo Markov Chain approach to 
numerically marginalizing over the unobserved part of azimuthally symmetric regions of the 
sky. 
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2.2. Embedding the Data - Marginalization 



The method to be developed in this paper involves embedding the data in a region for 
which the signal and noise matrices have desirable properties. The likelihood for the data 
in the context of some model is given as an integral over the part of the embedding region 
which was not observed. This gives the identity for the Bayesian posterior for the power 
spectrum or cosmological model (denoted by T), given the time domain data d(t) as the 
integral 



P (r \d) 



(1) B (2) 
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p( S W,s^,f\T ,d) 



(9) 



_p(r |sM,s (2) ). 

where we have explicitly written the CMB maps s = (s^\ s^) in terms of the part of the 
sky where we have data and the complementary region s^ 2 \ For the case of full sky 
coverage and prior knowledge q(T), the log posterior ratio is given as 



, p(F\s^,s^) , q{T) ^ n 



where the power at a given multipole order is 

c,(s (i) ,s (2) ) = — !— - y 
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For other regions with azimuthal symmetry (an annulus or polar cap), the posterior ratio 
would involve a similar form in terms of block diagonal matrices. The simple form of the 
log posterior ratio for full sky coverage is one of the motivations for treating the problem 
of partial sky coverage as missing data, and marginalizing over it (detailed and justified in 
Appendix A.3). 



2.3. Maximization 



In order to estimate the power spectrum, we would like to find T which maximizes the 
posterior given the noisy data. Differentiating the posterior with respect to the parameters 
Ci gives a gradient in the direction 



d log p(T\d) 



T 



oc £(1 + 1/2) 



E[Q\y,T ] 1 

Q 2 (r ) c,(r ) 



(12) 



where E[Ci(s)\T ,d] was given previously in equation 2. An improvement of our current 
estimate of the power spectrum r can then given according to a Newton-Raphson iterative 
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scheme ( (Bond et al. 1998) and (Oh et al. 1999 ) ) , where our current guess is updated 
using an approximation to the curvature of the likelihood F^, 1 [d, T n ] , according to 



(13) 

r„ 



In Appendix B, it is shown that the curvature matrix is given in terms of the expectation 
value 

EiQQ^To] = J ds Ci{s)C v {8) p(s\d,T ) (14) 

In practice, we might want to avoid computing the inverse of the curvature matrix, and 
simply use the diagonal elements. 

For this paper, we instead implemented the simpler (although more slowly converging) 
expectation maximization algorithm (Dempster et al. 1977). This method essentially follows 
from Jensen's inequality, giving the lower bound to the posterior 

* W - V + 1/2) l E[Q(s)M (m-ck)+ 

(15) 

For a uniform prior, the lower bound is maximized by E[Ci(s)\T ,y]. In Appendix B, we 
prove that this estimator iteratively converges to the maximum of the posterior, and is a 
consistent and unbiased estimator (for a uniform prior). 



2.4. Computing Expectation Values for the Power Spectrum 

In order to iteratively converge to the optimal, consistent estimator of the power spec- 
trum, we need to compute, for any current guess of the power spectrum T , the expectation 
value E[Ci(s)\y, T ] as defined in equation 2. Defining the mean field map 

s=[N- 1 + C- 1 (T Q )}- 1 N- 1 y (16) 

and associated power spectrum estimate 

q(«) = ^Eik«ii 2 ( 17 ) 

m 

the expectation value can be written by integrating over fluctuations about the mean field 
map £ = s — s as 

1 r e-m^+c-H^nM 

£[C,(s)|j/,r n ] = / dt{s + S\lm){lm\s + Z) e -z> [N -i + c-Hr n )]? 

(18) 
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Since E[(lm\£)} = 0, the expectation value is 

We refer to these two terms as the mean field map power spectrum estimate (known to be 
biased) and the correction term. Analytically, we know that the correction term is given by 
(2/ + l) -1 J2 m {lm\ [N^ 1 + C _1 (r„)] _1 |/m) however, it is intractable to compute and store the 
matrix [iV -1 + C _1 (r n )] _1 . Our strategy is to compute the correction term with a Monte 
Carlo method described below. 



2.5. Transformed White Noise Sampling 

Due to the computational intractability of computing the matrix inverse (iV -1 + C~ 1 )~ 1 , 
our strategy is to compute the expectation value of the correction term from fluctuation 
maps £ sampled from the zero mean Gaussian random field with covariance matrix (iV -1 + 
C -1 ) -1 . We could easily sample fluctuation maps £ if we could compute the eigenvectors and 
eigenvalues of the matrix (A r ~ 1 + C~ 1 ), since in this basis the Gaussian probability density for 
£ factors. However, computing the eigenvectors and eigenvalues is again an 0(N 3 ) operation. 

Because of these difficulties, we look for an alternative way to sample maps. Defining 
5 = (iV -1 + C _1 )£, we can write the log density, up to the normalization constant, as 

-^(N- 1 + C- r )Z = -SiN' 1 + C- 1 Y 1 5 (20) 

The transformed Gaussian process has the covariance matrix N^ 1 + C _1 , making it easy 
to sample from. Specifically, we can sample maps from this Gaussian process by drawing 
two independent white noise maps (ui,U2), and setting 5 = C^^ujx + TV -1 / 2 ^. Since both 
white noise maps are drawn independently from a zero mean Gaussian process, the resulting 
covariance matrix is E[S ® 5} = N^ 1 + (as discussed in Appendix C). 

The maps with the correct statistical properties are £ = (A^ 1 + C _1 ) _1 5, which can be 
solved numerically for a given map 5. A numerically stable implementation involves setting 
£ = (I + CN~ 1 )~ 1 C5 (as also noted in (Oh et al. 1999 )), and using conjugate gradient 
descent to solve 

(I + CN- 1 )^ = C +1/ V + CJV- 1/2 W2 (21) 

The resulting maps £ have the correct statistical properties, since E[£ <E>£] = (A^ 1 + C -1 ) -1 
(see Appendix C ), allowing us to compute the correction term to the power spectrum 
estimate of the mean field map as a sample average. 
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In order to actually sample fluctuation maps by transforming a Gaussian white noise 
process, we need to obtain the Cholesky decomposition of both the signal and noise matrices. 
If we have observations with uncorrelated noise on the sky, then N~ l l 2 is known in the spatial 
domain. However, the scan strategy of the instrument will result in complicated correlations, 
so that computing N^ 1 ^ 2 is intractable. The noise is simple in the time domain, which 
suggests that instead of choosing white noise maps in the spatial domain, we instead draw 
from white noise Gaussian processes in the time domain, where we know N~ 1 / 2 in the Fourier 
basis, followed by a transformation to the sky, where we can operate with the signal matrix 
C. 

For a realization of a white noise process in the time domain r and a white noise map 
in the spatial domain u, we can compute a fluctuation map according to 

[/ + CR T N~ l R]i = C 1/2 uj + CR t N~ 1/2 t (22) 

where N~ l l 2 is known in the Fourier basis associated with the time domain. In Appendix C, 
this procedure is justified with a proof that the covariance matrix of the fluctuation maps is 

E[i®i] = [C- l + {R T Nr 1 R)- 1 ]- 1 - 

2.6. Computational Expense 

The overall computational expense is fixed, for each iteration of the power spectrum es- 
timate, by the expense of matrix multiplication and number of iterations needed to converge 
with conjugate gradient descent. In order to multiply by the matrix CR T N~ 1 R we need to: 

• Transform to the time domain with the matrix R. 

• Compute a time domain FFT. 

• Multiply by iV -1 - this is a diagonal matrix in the time domain Fourier basis 

• Compute a time domain inverse FFT. 

• Transform back to the spatial domain with R T . 

• Compute a spherical harmonic transform. 

• Multiply by C - this is a diagonal matrix in the spherical harmonic domain if the 
embedding region is the full sky. 
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For the case of circularly symmetric beams, the convolution with the beam is not needed 
when operating with the matrix R or its transpose, giving an expense KO[N^ 2 + N t log N t ], 
where N t is the number of time samples, and K is the prefactor related to the convergence 
rate of conjugate gradient descent. For cases where the beam is not circularly symmetric, the 
convolution with the beam would have to be computed, increasing the expense to KO[N 2 + 
N t log N t ]. 



3. Numerical Example 

The simulations presented here involve the assumption of spatially uncorrelated, but 
non- homogeneous, noise, as shown by the upper left in figure 1. We also restrict the problem 
to power spectrum estimation from a small patch of sky, and neglect curvature (and there- 
fore work with discrete Fourier basis instead of spherical harmonics). Our goal with these 
numerical simulations is to demonstrate the approach in action. Future work will involve 
numerical implementations on the sphere. 

A CMB power spectrum was generated using CMBfast (Seljak and Zaldarriaga 1996), 
followed by the creation of a full sky map on the sphere using the synfast routine in the 
HEALPIX package (Gorski et al. 1998). A smaller patch of the sky was then selected, and 
projected on a rectangular grid. This map s(n) was taken to be the noise free map, as shown 
in the upper right of figure 1. We then generated a noise map 77(71) by selecting independently 
at each pixel a Gaussian random number with variance scaled as shown according to the 
upper left of figure 1. Noise was added to the noise free map, and data then removed in a 
rectangular hole (as shown in figure 1). This was taken to be a simulated data set y = s + rj 
with partial coverage of the rectangular patch of sky. 

The inverse noise matrix was given in terms of the variance at the ith pixel a 2 as 

Sija~ 2 for the observed region of sky , . 

lj elsewhere 

As an initial estimate of the power spectrum, we computed the the power spectrum of the 
noisy, incomplete data (as computed in the two-dimensional Fourier basis since we neglected 
curvature) and subtracted the power spectrum of a single simulated noise map (on the 
region of sky where we have data). We then iteratively adjusted the power spectrum with 
the expectation maximization as above until convergence. 

We found that preconditioning was in fact necessary to achieve a reasonable convergence 
rate. Using the initial estimate of the power spectrum C(T ), and computing the diagonal 
elements of the inverse noise matrix in the Fourier domain (/c|A^ _1 |A;) with Monte Carlo noise 
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maps, gave the preconditioner 

M l : k ) = 5 kk ,[I + C kk (T )N kk 1 ]- 1 (24) 

Conjugate gradient descent was then used to solve the linear equations for the mean field 
and fluctuation maps 

M~ l [I + CN~ l }s = M- l CN~ l y 

M' l [I + CN~ l }i = M~ l 5 (25) 

where 5 = C +1 / 2 uo\ + CN~ l l 2 uo 2 was computed from two independently chosen spatial white 
noise maps (uji,lj 2 ), and N~ l l 2 vanished in the unobserved part of the sky, and elsewhere 
given by o~ x . The result of iterating the algorithm to convergence is shown in figure 2. 
Uncertainties in the power spectrum estimate were computed by Monte Carlo, in which new 
CMB maps were generated, noise added, and the algorithm run again. 

4. Additional Problems 

The methods presented above can be generalized to handle other additional problems 
faced in the Bayesian analysis of the CMB. We do not provide numerical examples, but 
briefly include comments on how to use transformed white noise sampling to estimate error 
bars and include foregrounds. 

4.1. Confidence Intervals from a Markov Chain 

The ability to sample maps of the CMB given some estimate of the power spectrum 
can be used to construct a Markov Chain Monte Carlo algorithm which converges to the 
Bayesian posterior p(T\d) itself. Previously, Markov Chain Monte Carlo techniques have 
been proposed for the extraction of marginal densities for cosmological parameters from 
approximate Bayesian posterior densities for the power spectrum ( (Christensen et al. 2001) 
, (Knox et al. 2001) , (Lewis and Bridle 2002) , (Runbino-Martin et al. 2002) ) 

We want to construct a transition matrix for the joint density of CMB maps and the 
power spectrum. This can be done by following the Metropolis Hastings algorithm, where 
we first assume detailed balance 



P (r\, Sl |y)T(r 2 , s 2 |r 1} s i;y ) = t(t u Sl |r 2 , s 2 , y ) P (r 2 , s 2 \ y ) 



(26) 
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From the condition of detailed balance we see that 

p(T\y) = J d(s, T 2 , s 2 ) T(r, s|r 2 , s 2 ; y)p(r 2 , s 2 |y) (27) 

which shows that the Bayesian posterior is the marginalized equilibrium density, generated 
by repeatedly taking steps generated with the transition matrix T(F 2 , s 2 |Fi, Si, y). Given 
any approximation to the joint density p(T 2 , s 2 \y), repeated application of the transition 
matrix will reach the equilibrium density. 

We can construct the transition matrix as follows. We first assume the proposal density 
is independent of the past p(T 2 , s 2 |Ti, si; y) = p(T 2 , s 2 \y) and given by 

p(r 2 , s 2 \y) oc e -(d-Rs)N- Hd -Rs)-sC-Hr 2]sp{r2ly) (2g) 

where p(T 2 \y) is any approximation to the Bayesian posterior itself. We therefore have, as 
before, the conditional density 

e -(d-Rs)N- 1 (d-Rs)-sC- 1 [r 2 ]s 

p(s\T 2 ,y) = ^ e _ (d _ Rs , )A r-i (d _ Rs / ) _ s , c .-i [r2 ] s , (29) 

and the marginal density for the power spectra given by our approximation p(T 2 \y). With 
this choice of proposal matrix, the acceptance matrix can be chosen to be 



a[s 2 , T 2 |si, Ti] = min 



x p(r 2 ,s 2 |y)p(ri,si|y) 



p(T 1 ,s 1 \y) p(T 2 ,s 2 \y)_ 
What makes this computationally tractable is that we know the ratios 

P(r 2 , S 2 \y) e -(d-Rs2)N- 1 (d-Rs 2 )-s 2 C- 1 lT 2 ]s2-log\\C(T 2 )\\ q ^ T2 



(30) 



p(T u Si\y) e ^(d-Rs 1 )N-^d-Rs 1 )- Sl c-^r 1 ]s 1 -iog\\c(r 1 )\\ q \jp^ 

as well as 

p(ri,Si|y) _ e -(d-Rsi)N-Ud~Rs 1 )~s 1 C-HT 1 ]s 1 p^ Ti ^ 

p(T 2 ,s 2 \y) ~ e -(rf-^2)JV- 1 (d-fe2)- S2 c-Mr 2 ] S2 p( r2 | y ) 
which simplifies to the explicitly computable acceptance probability 



a[s 2 , r 2 |si, Ti] = min 
In summary, a single step of the Markov chain involves: 



e -iog||c(r 2 )|| g[r2 ]p (ri | y) 
'e-iogliccroii^r^r^) 



(31) 



(32) 



(33) 



Choose a new guess for the power spectrum from p(T 2 \y). 
Sample a map from p(s 2 \T 2 ;y) according to 
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- Compute the mean field map s 2 = [R T N~ l R + C- l (T 2 )}- 1 R T N~ l y. 

- For two independently sampled white noise maps in the spatial and time domains 
(u, r), compute f as the solution to (I + CR T N~ l R)^ = C +1/ V + R t N- 1 I 2 t. 

- Set s 2 = s 2 + £■ 

• Accept the transition (r^Si) — > (r 2 , s 2 ) with probability a[r 2 , s 2 |Fi, Si! y]. 

• Continue 



For circularly symmetric beams, each step of the Markov chain has expense KO[N 3 ^ 2 + 
N t logN t ], giving a total expense N M cA~^ c KO[N 3 / 2 + N t \og N t ], where N MC is the number 
of realizations of (r, s), and A~^ c quantifies the "efficiency" of the Markov chain (suggestively 
denoted by A^ 1 since we intuitively expect it to vary inversely with the average acceptance 
probability) . 

By the efficiency of the Markov chain, we mean the number of proposed moves that 
must be made until one is accepted. This efficiency depends entirely on how good the 
approximation to the posterior p(T 2 \y) actually is. We can always write the acceptance 
matrix as 

p{s 2 \r 2 ,y)p(r 2 \y)p(s 1 \r 1 y)p{r 1 \ y y 



a[s 2 , F 2 | Sl , r x ] - min p 
However, by construction, this is exactly 



a[s2, r 2 |si, Ti] = min 



j p( r 2\y)p(Ti\y) 

. 'p(ri|j/)p(r 2 |y) 



(34) 



(35) 



so that if our approximation was exact, the acceptance matrix is unity for all proposals, 
and we reduce to an expense NmcKO [N 3 ^ 2 + N t log N t ] . Provided that we have reasonable 
approximations to the posterior, we can converge to the exact posterior with this method, 
and the use of transformed white noise sampling. 

We also briefly comment on some practical issues with this approach. In order to run the 
Markov chain, we need to compute the mean field map for every proposed power spectrum. 
While this is feasible, we can speed up the Markov chain by sampling a grid of power spectra 
from our approximation p(T\y), and pre-computing the mean field maps for each T on our 
grid. Then, for any proposed power spectrum, we can start with the initial condition for 
conjugate gradient descent with the closest grid point. 
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4.2. Inclusion of Foregrounds 

We conclude with a brief discussion of the inclusion of foregrounds. The inclusion of 
the foregrounds involves sampling both CMB and foreground maps given some estimate of 
the CMB power spectrum and some prior for the foregrounds. For a Gaussian prior for the 
foregrounds, the same approach to sampling can be followed. 

Including foregrounds the data for the jth frequency channel is given by 

dj it) = A j0 Rj s + Y^ A ip R j fp + Vj (36) 
p 

where Rj is the mapping from the sky to the time domain for the jth frequency channel, Aj 
is the response of the CMB at the jth frequency, and we sum over the foreground components 
f p . For a statistical characterization of the foregrounds /3 and a guess of the CMB power 
spectrum T, we need to sample from the density given by (up to normalization) 

-\ogp(s,f\d,T ,P) ~ (d-AoRs-ARf)N' 1 (d-AoRs-ARf)-sC' 1 (T )s-\ogp(f\(3) (37) 

We need to compute the expectation value 

£[C,|d,r ,/3] = j d{s,f)C l {s)p{s,f\d,To,(5) (38) 

This expectation value can be numerically computed by sampling maps (s, /) by the time 
average of a Markov chain with equilibrium density p(s, f\d, T , (5). One legitimate way to 
construct such a Markov chain is to alternately sample maps from the conditional densities 
p(s\f, d, T , (3) and p(f\s, d, T , (3). We briefly comment on sampling maps from each of these 
conditional densities. 

Given some estimate of the foregrounds, we need to sample maps from the conditional 
density 

-\ogp{s\f,d,To,(3) ~ (d- A Rs- AR/)iV _1 (d- A Rs - ARf) - sC" 1 (r )s (39) 

Sampling from p(s\f, d, T , (3) proceeds in essentially the same way as described above, but 
generalized for multi-frequency data. Specifically, the mean field map includes the subtracted 
response from the foreground estimate 

s = [C- 1 + A R T N- 1 RA }- 1 AoR T N- 1 (d - Af) (40) 

and the fluctuation maps include a time domain white noise sample for each frequency 
channel 

(/ + CR T AoN~ l AoR)i = C^V + R J A 0J N i lr j ( 41 ) 

3 
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Note that the covariance matrix of the fluctuations is 

E[£ ® £] = (C" 1 + IFAoN-'AoR)- 1 (42) 

where the proof depends on the independence of the time domain white noise maps for every 
frequency channel. 

Given some estimate of the CMB, we need to sample from the conditional density 

- logp(f\s, d, F , (3)~{d- A Rs - ARf)N-\d - A Rs - ARf) - logp(/|/3) (43) 

If the prior for the foregrounds p(f\P) is Gaussian 

-\ogp{f\(3)~f B -\(3)f (44) 

then we can also use transformed white noise sampling. First, we compute the mean field 
foreground map, 

/ = [B- 1 + R T AN~ l AR}~ 1 R T AN~ l (d - A s) (45) 
and then sampling fluctuations £ p according to the transform of white noise samples 

(/ + BR T A p N~ 1 A p R)C P = 5 V V + ^ RjA^N' 1 ^ (46) 

3 

Multiplying by the foreground signal matrix B is either done in the pixel or spherical har- 
monic basis depending on the basis in which it is sparse. 

If we use a non-Gaussian prior (such as the maximum entropy method, or other priors), 
then we will need to employ more general sampling techniques to sample foreground maps, 
such as the Metropolis algorithm. In this case the needed expectation value is to be computed 
as the time average from a Markov chain with equilibrium density p(s, f\y, T n ). 

5. Conclusions 

The fundamental hurdle to numerically implementing an exact Bayesian approach to 
CMB analysis, including complications of partial sky coverage, correlated noise, and fore- 
grounds, is finding efficient ways to solve the linear problem (I + CR T N^ 1 R)^ = 5 for any 
vector 5. Solving the linear equation has an expense KO[N 3 ^ 2 + 7V t logiV t ] for circularly 
symmetric beams, and the algorithm provides a tractable approach provided K can be made 
small enough. The strategy we presented in this paper allows the data to be embedded in 
an azimuthally symmetric region of the sky covered by a Wandelt ring set scan, with the 



-16- 



intuition that, provided the true scan of the instrument is close enough to the exact scan, 
we inherit good preconditioners. 

We also commented on how the method of transformed white noise sampling can be 
used in Monte Carlo Markov Chain for the entire Bayesian posterior. The feasibility of 
this approach depends on a good approximation to the posterior itself. Previous work has 
demonstrated several computationally feasible, unbiased estimates of the power spectrum 
and associated error covariance matrix. Any of these methods could therefore be used, in 
principle, to give an approximate posterior, so that a Markov chain approach can be used as 
a final consistency check. 

Future work will incorporate the foregrounds in the algorithm presented here, general- 
ized for multifrequency data. Maximization of the likelihood of the power spectrum given 
the data again leads to the computation of the expectation value E[Ci\d, r ], but now the 
marginalization includes the foregrounds as well. If the prior for the foregrounds is Gaus- 
sian, then we can also use transformed white noise sampling to sample a new foreground map 
while conditioning on the CMB. If the prior used is non-Gaussian, other sampling schemes 
can be used, including Gibbs sampling or the Metropolis algorithm. 

This research was carried out at the NASA Jet Propulsion Lab, under NASA AISRP 
Grant, and support from the Long Wavelength Center. We also thank Eric Hivon and Ben 
Wandelt for interesting discussions during the course of this work. 



The data returned from a CMB experiment is a vector d(t) in the time domain, gen- 
erated from scanning the CMB signal s(n) and foregrounds f(n) on the sky and adding 
independent Gaussian noise. The Bayesian posterior is given directly as an integral over 
unknown quantities by 



For the case of Gaussian random fields, this integral can be done analytically (as will be dis- 
cussed in Appendix A. 2 ), but evaluation of the resulting likelihood leads to computationally 
intractable matrix manipulations. For any estimate r 



A. Identities for the Bayesian Posterior 



A.l. Identity for the Posterior 




(Al) 




' p(T,s,f\d) ' 

A^,sj\d)_ 



p(T ,s,f\d) 



(A2) 
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Since we have 



this becomes 



p(T\d)=p(T \d) / d(s,f) 



p(r,s,f\d) _ p(T\s,f,d) 
p(T ,s,f\d) p(T \s,f,d) 

p(T\s,f,d) 



p(s,f\T ,d) 



(A3) 



(A4) 



.p(r |s, /, d)_ 

where we have used p(T , s, f\d) = p(s, f\T , d)p(T \d). By the assumed independence of the 
noise, the joint density over which we are marginalizing is 

p(r, s, f\d) oc p(d\s)p(s\T)p(f\[3)q(T) (A5) 

where q(T) is a prior for the parameters, p(f) is a prior for the foregrounds, and p(d\s) 
is completely determined by the noise properties of the instrument, the beam, and scan 
strategy. Given some estimate of the noise free CMB signal, the density for a new guess of 
the power spectrum is independent of the data, p(T\s, f,y) = p(T\s), as shown by 

p(d\s)p(s\T)p(f)q(T) 



P (T\s,f,d) 



Jdr p(d\s)p(s\r) P (f)q(r) 

p(s\T)q(T) 



J dV p(s\r) q (r) 

Therefore, for any estimate T , our identity now reads 



P (r |d) 

or for the likelihood ratio 

p(d\T) 

P (d\r ) 



d(s,f) 



d(s,f) 



P( r ok) 

p(s\T) 
P( s \ T o) 



p(s,f\r ,d) 



p(s,f\r ,d) 



(A6) 



(A7) 



(A8) 



A. 2. Transformations of the Data and Associated Likelihood 

The likelihood of the data, generally in the time domain for some experiment, is 

P (d\C) OC J ds e ~{d~Rs)N-Hd-Rs) e -sC^s (AQ) 

We first consider the case of transforming the time ordered data to a map according to 
the transformation s = (R 7 'iV -1 /?) -1 /? 7 'N~ 1 d. The likelihood is the probability density 
evaluated at the data. Therefore, we first need to find the probability density of maps given 
by transforming the density for time ordered data, according to 

p[s\d\ = I ' 5d 5[s - (R T N~ l R)~ l R T N^d] [ ds e -(d-Rs)N-^ d -R S ) e -sC-^ (AlQ) 
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This is equivalently 

p[s\Q] oc J 8d8[s-{R T N- 1 R)- 1 R T N- 1 d\ J ds e -(d-Rs)N-^d-R S ) e -sC~h 



5d J dk e -ik-s e+ iHR T N^R)-^N-^ J dg e -(d-Rs)N-i(d-Rs) e -sC-i S 

= J dk e~ ik - s J ds e- sC ~ ls J 5d e +*HR T N-iR)-iRTN-^ e -( d -Rs)N-Hd~Rs) 

= J dk e~ ik S J ds e +ik ' s e~ sC '~ ls J Sd e + i HR T N- 1 R)- 1 R T N- 1 (d-Rs) e -(d-Rs)N- 1 (d-Rs) 

= J dk e -^ e -HR T N-^k J ds e+l k.s e -sC-is 

= J dk e~ ihS e~ k(RTN ~ lR) ~ lk e~ kCk 

= e -si(R T N- 1 R)- 1 +ci- 1 i ^ A1 ^ 



Notice that there are many time series vectors which correspond to the same spatial map, 
since we can add any vector such that R T N~ l d' = 0. This is possible since R T d vanishing 
means that averages at the same point vanish. 

We can also write the likelihood in terms of the data in the original time domain. This 
can be done by writing the characteristic function 



C(Jfe) = J 5d e~ tk - d J ds e -(d-Rs)N-Hd-Rs) e -sC^s 

' dse~ sC ' ls ( 5d e~ ik - d e -(d-Rs)N-\d-Rs) 



which shows that 



J ds e -^-Rs e sC-^ S J Sd e -ik-{d-Rs) e -(d-Rs)N- 1 (d-Rs) 
e -kNk J ds e -ik-Rs e -sC^s 

e -k(N+RCR T )k ( Al2 ) 

p(d\Q) oc e -^+RCRT)-^ (A13) 



A. 3. Partial Sky Coverage 



The likelihood for the CMB given the theory for partial sky coverage can be written as 
the marginalization over the unobserved part of the sky. Denoting the CMB s = (s^\ s^) 
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as the CMB in the observed and unobserved regions of sky respectively, we have 

Therefore, the posterior for partial sky coverage can be written 
p(T\y) oc q(T)Jd(s^J)p(y\s^J)p(s^\T)p(f\(3) 



q(T) I d(s W ,f) p(y\s [1) ,f) 



(i) 



ds^ P (s (1) ,s (2) |r) 



This gives the identity, explicitly written for arbitrary sky coverage, 



p(£\y) 
p(r |y) 

„(2) 



p(t\s 



(i) »<?y 



.P( T o\s 



(1) c(2)^ 



p(s^,s^,f\r ,y) 



Because s = (s { s { ') is supported on the full sky, the log posterior ratio is 



log^ = log^-B^V2) 



1 



Q(T) c,(r ) 



log 



Ci(T) 

c,(r ) 



>(r |s) g(r 

and the conditional density from which we are to sample (s^\ s^) is 

- \ogp(s^\ s^\T , y)~{y- s^)N~ l {y - S W) + s^CT 1 ^)^ 1 ), s< 2 )) 



(A14) 



= qiT) J d{s^\s^\f)p{y\s^\f)p{s^\s^\V)p{f\(3) (A15) 



(A16) 



(A17) 



(A18) 



This is equivalent to setting the inverse noise matrix to zero for the part of the sky where 
there is no data. 



A. 4. No Data and No Noise Limits 



Two limiting cases of our identity involve "no data" and "no noise". In the no data 
limit, the inverse noise matrix vanishes everywhere, so that p(s\y,T ) — > p(s|r ) and the 
posterior is given by the prior, as shown by 



g(£jy) 
p(r |y) 



ds 
ds 

g(r) 
?(r ) 
gg) 
?(r ) 



.p( r ok). 
p(s|r) 9 (r) 



_p(s|r )g(r ) 



rfsp( s |r) 



p(s|r ) 
p(«|r ) 



(A19) 
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In the noise free limit, the conditional density p(s\y,T ) — > 5(s — s true ), independent of our 
choice of T , so that we converge to the noise-free posterior ratio. 



B. Properties of the Power Spectrum Estimator 
B.l. Posterior Maximum 

Recall the the algorithm used involves iterating 

c,(r n+1 ) = J E?[c I |r n ,y] 



(Bl) 



and therefore the fixed point satisfies Ci(T) = E[Ci\T,y]. We can prove that this estimator 
maximizes the posterior, or equivalently the log posterior. A direct calculation shows that 



8 p(T\y) 



dQ(T) p(T \y) 
so that at the maximum 



P(I>) 

p( r ok 



-p(s\y,r ) (B2) 



° = J ds [» + V2)Q( S )(^-^) 



p(s\y, r ) 



(B3) 



At the maximum we therefore have 



c,(r ) 



ds Ci(s)p(s\y,F 



(B4) 



which is identically the fixed point of the expectation maximization algorithm. Therefore, 
the iteration converges to the maximum of the posterior (for a uniform prior). 



B.2. Estimating the Curvature of the Likelihood 

After we have computed the maximum likelihood estimator of the power spectrum (or 
maximum posterior estimate), we want to find a confidence interval. There are several ways 
this confidence interval might be approximated. One approach is to compute the inverse 
curvature matrix of the likelihood, and take the diagonal entries as an estimate of the error 
bars (refs). We comment below on how this can be done with the same method used to 
compute E[Ci\y,T}. 
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Approximating the likelihood as a Gaussian functional of the Ci(T) is equivalent to a 
second order Taylor expansion of the log likelihood about the maximum. As before we have 
the identity for the likelihood 

p(d\T) _ f da Pm pmTo) (B5) 



p(d|r ) J p(s|r„) 

It is more convenient to parameterize the likelihood ratio in terms of Q\ = Cf 1 , so that when 
embedding the data on the full sky, we have 

"* |S = " D' + V2) ([*(!■) - *(r„)]C7, W - log m) (B6) 

Denoting the curvature matrix of the likelihood 

F„,[d,r ] = — dQdQi (B7) 

we have the relation 

-f H / [tZ,r ] - Q2(ro) ^ cf/{To) (B8) 

The curvature of logp(d|r) evaluated at the maximum, where Ci(T ) = E[Ci\d,T ] is there- 
fore 

d 2 \ogp(d\T) a + Mo\r*(v\ 

mWv ~ -M* + (r ) 

+(i + i/2)(z' + 1/2) [£[c,c,,|d, r ] - c,(r )c,,(ro)] (B9) 

where we have defined the expectation value 

E[QQ,\d,r } = J di Q(s + OQ,(s + ^ - g _ (BIO) 

This can, in principle, be computed with the conjugate gradient descent method of transform- 
ing samples from a white noise process. Future work will study the accuracy and convergence 
properties of estimating the curvature matrix from transformed white noise sampling. 



B.3. Covariance Matrices 



The correctness of the algorithm for power spectrum estimation presented in this paper 
is established by proving that the covariance matrix of two linearly transformed vectors 
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has some specific form. For simplicity of notation, we choose to denote the covariance 
matrix of two vectors (x, y) as the expectation value of the outer product of the vectors 
E[xiUj] = E[x <g> y]ij. For two matrices A and B, an identity used repeatedly in computing 
covariance matrices is 

E[(Ax) <g> (By)] = AE[x <g> y]B T (Bll) 
which is shown simply by checking for each matrix element 

E[(Ax) <g> (By)] mn = J2 E l A mi x i B njyj} 

ij 

= J2A mi E[x iyj ]Bj n (B12) 

ij 

One example of this identity is in computing the expectation value of maps computed from 
time ordered data. One form of making a map from time ordered data is given by 

s = (R T N- 1 R)- 1 R T N- 1 d (B13) 

as mentioned in section 2 and in (Tegmark 1997). According to the identity above, the 
covariance matrix of this map is 

E[s®s\ = (R T N- 1 Ry 1 R T N- 1 E[d^d}(R T N- 1 ) T [(R TN ' 1 Ry 1 } T ( B14 ) 
Substituting d — Rs + rj, this is 

E[s®s] = (R T N- 1 R)" 1 R T N- 1 E[(Rs + r])®(Rs + r])]N- 1 R(R T N- 1 R)- 1 
= (B^N^R^B^N- 1 (RE[s ® s}R T + RE[s <g> 77] 

+E[r] ® s]R T + E[r] ® 77]) N' 1 R(R T N' 1 Ry 1 
= (R T N- 1 R)- 1 R T N- 1 (RCR T + N) N' 1 R(R T N' 1 R)' 1 
= C + (R T N~ l R)- 1 (B15) 

where we have used the independence of the signal and noise, and also that both are zero 
mean processes. This result is also proven in Appendix B below using the characteristic 
function of the transformed time ordered data. 



B.4. Consistency 

The estimator given by the expectation maximization algorithm (also equivalent to the 
maximum likelihood estimator) is given by 

, e -(y- Rs ) N -l(y^ Rs )^ sC - 1 S 

E[S <g) S\y] = J ds[s® S ] J rfg/ e -(y-Rs>)N-Hy-Rs>)s>C-is> ^ 
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For partial sky coverage, this is equivalent to embedding the data on the full sky and 
marginalizing over it 

f e -{yW-s)N- l {yW-s) -(y-Rs)N- 1 (y-R8)-8C- 1 8 

Els® sly] = / d(s,y i2) ) \s®s] = (B17) 

1 |yj J K ,y r d(s',y') e -(y'- s ') N (y'- s ')e-(y- Rs ') N (y- Rs ')- s ' c s ' 

where we have arbitrarily chosen the full sky inverse noise matrix 

FFN^R 



N' 1 - 

A ! 

The expectation of the covariance matrix is then 

e -(j/ 2 > -s)^- 1 (y( 2 ) -s) e -(y-Rs)N- 1 (y-Rs)sC- 



(B18) 



E[E[s®s\y]] = j dy J d(s,y {2) ) [s®s] 



J d(s', y') e-iy'-^N-Hy's^g-iy-Rs^N-^y-Rs^-s'C-^s' 

(B19) 

Consistency of the estimator is then shown by proving that C = E[E[s <g> s\y]\. 

Using the augmented noise matrix (which now has an inverse on the full sky), we can 
define the mean field map s = [iV _1 + C~ l ]~ l N~ l {y, y^), and write 

E[E[s®s\y]] = {N- 1 + C- 1 )- 1 N- 1 E[y®y}N~ 1 {N~ 1 + C- 1 )- 1 
= C{N + C)'\N + C){N + C)- 1 C 

= CiN + C^C (B20) 

The expectation value of the correction is (iV -1 + C -1 ) -1 , so that 

E[E[s®s\y]] = C(N + C)- l C +(N~ l + C- 1 )- 1 
= C{N + C)- 1 C + N{N + C)- 1 C 
= C (B21) 

Therefore, the expectation maximization algorithm converges to the maximum likelihood 
estimator for a uniform prior, which is also a consistent estimator for arbitrary sky coverage. 



C. Correctness of Transformed White Noise Sampling 

As derived in section ? above, the algorithm to sample maps from the Gaussian process 
with covariance matrix (iV -1 + C -1 ) -1 is 



Draw (u>i,u>2) from a white noise process. 
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• Compute 5 = C +1 ' 2 uj 1 + CN~ x l 2 u 2 . 

• Compute £ = (I + CN~ 1 )~ 1 5 with conjugate gradient descent. 

We can prove that for any AT -1 , even one which does not have an inverse (i.e. as in the case 
of partial coverage of the chosen embedding region), we have E[(; <g) £] = (iV" 1 + C -1 ) -1 . 
From the above we have 

f = (AT- 1 + C- 1 )- 1 ^- 1 / 2 ^' + N- 1/2 lo] (CI) 

which gives 

E[£ <g> f] = (N' 1 + C-^EKC-^uj' + N' 1/2 to) <g> (C^V + iV- 1/2 ^)](A^- 1 + C~ l y l 
= (N- 1 + C- 1 )- 1 {C- l ' 2 E[J <g> J\C- 1 ' 2 + C- l ' 2 E[J <g> uj]N- 1 ' 2 

+N- 1/2 E[lu <g) cu']C- 1/2 + N^E^ <g> uj]N- 1/2 ) (N- 1 + C- 1 )- 1 
= (AT" 1 + C- 1 )- 1 (C- 1 + N- 1 ) (N- 1 + C- 1 )- 1 

= (N^ + C- 1 )- 1 (C2) 

where by independence of the two white noise maps, E^fgiu/] = E[cj] (gJ-Ejo/] which vanishes 
since the white noise process is zero mean. An important point to notice is that the matrices 
A^ 1 or N" 1 / 2 can be singular in the sense that they do not have inverses on the full sky (i.e. 
are generated by incomplete scanning of the sky). In fact N~ l l 2 uo vanishes in the null space 
of A^ -1 (where we do not have data). 

For a realization of a white noise process in the time domain r and a white noise map 
in the spatial domain u, we compute a fluctuation map according to 

f = (/ + CR T N- l R)- 1 [C 1 ' 2 uj + CR t N-^ 2 t] (C3) 

where N' 1 ^ 2 is known in the Fourier basis associated with the time domain. We can prove 
that the maps £ have the correct covariance matrix, E[£ <8> £] = (R T N~ 1 R + C -1 ) -1 by the 
direct calculation 

E[£ <g) f ] = (R T N- 1 R + C- l )- l E[(C- l/2 uu + RFN'^t) <g> (C~ 1/2 uj + R t N~ 1 / 2 t)} (R t N~ 1 R + C^ 1 )" 1 
= (R T N~ 1 R + C- 1 )- 1 {C- l ' 2 E[oo ® uj]^ 1 ' 2 + +R t N^ 2 E[t ® r]]^ 1 / 2 ^) (i^AT 1 + C" 1 )- 1 
= (R T N- 1 R + C- 1 )- 1 (C- 1 + ^N^R) (^iV" 1 + C- 1 )- 1 

= (R T N- 1 R + C- 1 )- 1 (C4) 



where again, the cross terms vanish since independence gives -Eju; ® r] = ® -E^r], which 
vanishes since both are zero mean processes. 
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D. Embedding the Data in Azimuthally Symmetric Regions 

We can also find the likelihood for the data embedded in an azimuthally symmetric 
region covered by a ring set. As discussed in (Wandelt and Hansen 2001 ), we represent the 
signal on the ring set with coefficients a mm i, so that 

s(M) = E fl ™' e """ 9e "^ ( D1 ) 

mm' 

where both indices range from —L max < m < L max . Therefore, the signal on the sky, 
parameterized on the ring set, is given by a two-D inverse FFT. For any specific power 
spectrum, the corresponding signal matrix on the ring set covering the embedding region is 
block diagonal. We denote the ring set covariance matrix T = E[a <S> a}. 

Given the noise free signal on the ring set s(0,(f>), any observed data set is given by 
projecting into the sub-region where we have data, 

d(t)=Qs + ri(t) (D2) 

where Q vanishes on the regions of the ring set where we have no observations. As before 
we would first compute the mean field and fluctuation maps. In order to sample fluctuation 
maps, we would again compute, 

(I + TQ T N~ l Q)i = T +l l 2 u + CQ T N- l l 2 T (D3) 

where r is a time domain white noise process, u is a spatial white noise process on the full 
sky, T +l l 2 is the Cholesky decomposition of the ring set covariance matrix. 

Although it is possible to compute the Cholesky decomposition of the ring set covariance 
matrix, we might instead sample C 1 I 2 uj on the full sky, and then project into the region 
covered by the ring set, giving maps RC x I 2 uj. Then the covariance matrix for these maps is, 
using the usual identity 

E[(RC 1/2 uj) ® (RC 1/2 u)] = RC 1/2 E[u®uj}C 1/2 R t 

= RCR T (D4) 

which is the correct covariance matrix on the ring set region. The point is that we do not 
have to compute the Cholesky decomposition of the signal matrix on the ring set, but can 
instead transform white noise maps with the Choelsky decomposition of the full sky signal 
covariance matrix (diagonal in the spherical harmonic basis), and then project down to the 
ring set region. 
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We can construct a preconditioner as follows. Define the projection from the time 
domain to the ring set following a Wandelt scan strategy as W T N~ 1 W. The for the same 
time domain noise matrix iV _1 , we can use a preconditioner given by W T N~ l W and define 

M- 1 = [I + TW T N' 1 W}- 1 (D5) 

It is shown in (Wandelt and Hansen 2001 ) that W T N~ 1 W is block diagonal on the ring set, 
so that M _1 can be computed in 0(N 2 ) operations. We can then solve the linear equations 
for the mean field and fluctuation maps 

M-^I + TQ T N~ 1 Q]s = M~ l TQ T N~ l d 

AT 1 [J + TQ T N- 1 Q]i = M- 1 {T +1 ' 2 uj + TQ t N- 1 ' 2 t) (D6) 

For scans which are close to the ring set scan the intuition is that conjugate gradient descent 
will converge quickly for the above equations. 
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E. Figures 

• Figure 1 - The top left image shows the variance of the noise in each pixel. The black 
region is an unobserved "hole". The upper right plot is a noise free realization, and 
the lower left is a mean field map given the simulated noisy data. The bottom right 
shows a typical fluctuation map computed with conjugate gradient descent. 

• Figure 2 - Plot showing the power spectrum estimation after 20 iterations starting from 
a flat initial guess. The green dots represent the results of separate runs on each of 10 
simulated data sets, with each shade of green representing a different run. For each 
data set we have produced a new noise-free map (drawn from the theoretical power 
spectrum given by the solid black line), and added inhomogenous noise and a hole 
as shown in Figure 1. The initial guess used was the expected (flat) noise spectrum, 
multiplied by a random number between and 1. At low / the SNR is high, and the 
spread in the dots is caused by noise and sample variance. At high /, the SNR is low 
and the spread in the dots is caused by variation in the initial guess, resulting in an 
upper bound as shown in the plot. Also shown are sample power spectra for a single 
simulated data set (light blue line), and noise- free map (dark blue line). 
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